Week 10 - Day 3

Manual Model Building

Explanatory model - this is what we explored yesterday: check each candidate predictor for statistical significance and add it to the model, continuing until the model is built. You can explain the model and you know how it makes its predictions.

Predictive model - black-box models; strong predictive power, but little statistical explanation.

Model building follows a systematic approach, but it is not an exact science; it is equally an art, and domain knowledge is therefore important.

library(tidyverse)
library(car)
library(modelr)
library(GGally)

Getting rid of the census variable

Checking for missing values (rows with NAs are dropped when creating prestige_trim above)

summary(prestige_trim)
   education          income          women           prestige       type   
 Min.   : 6.380   Min.   :  611   Min.   : 0.000   Min.   :14.80   bc  :44  
 1st Qu.: 8.445   1st Qu.: 4106   1st Qu.: 3.592   1st Qu.:35.23   prof:31  
 Median :10.540   Median : 5930   Median :13.600   Median :43.60   wc  :23  
 Mean   :10.738   Mean   : 6798   Mean   :28.979   Mean   :46.83   NA's: 4  
 3rd Qu.:12.648   3rd Qu.: 8187   3rd Qu.:52.203   3rd Qu.:59.27            
 Max.   :15.970   Max.   :25879   Max.   :97.510   Max.   :87.20            

Choosing first predictor

# prestige is the response; choosing the first predictor from the pairs plot

prestige_trim %>%
  ggpairs(aes(color = type, alpha = 0.5))

# from these plots, education, income, and type look like the strongest candidate predictors

Looking at education first

mod1a <- lm(prestige ~ education, data = prestige_trim)

mod1a

Call:
lm(formula = prestige ~ education, data = prestige_trim)

Coefficients:
(Intercept)    education  
    -10.841        5.388  

Interpretation of this:

Prestige = -10.841 + 5.388 * education

Which means -10.841 is where the line intercepts the y-axis (the predicted prestige at zero years of education), and prestige then increases by 5.388 points for each additional year of education.
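
For example, an occupation requiring 12 years of education would have a predicted prestige of roughly -10.841 + 5.388 * 12 = 53.8.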

summary(mod1a)

Call:
lm(formula = prestige ~ education, data = prestige_trim)

Residuals:
    Min      1Q  Median      3Q     Max 
-21.605  -6.151   0.366   6.565  17.540 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -10.8409     3.5285  -3.072  0.00276 ** 
education     5.3884     0.3168  17.006  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.578 on 96 degrees of freedom
Multiple R-squared:  0.7508,    Adjusted R-squared:  0.7482 
F-statistic: 289.2 on 1 and 96 DF,  p-value: < 2.2e-16

Diagnostic plot

par(mfrow = c(2, 2))

plot(mod1a)

Task

Choosing a second predictor by plotting the residuals of the first model against the remaining variables

prestige_remain_resid %>%
  ggpairs(aes(color = type, alpha = 0.5))


Second model development: based on the residual plots, adding income as the second predictor (and also trying type for comparison)

summary(mod2a)

Call:
lm(formula = prestige ~ education + income, data = prestige_trim)

Residuals:
     Min       1Q   Median       3Q      Max 
-16.9367  -4.8881   0.0116   4.9690  15.9280 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -7.6210352  3.1162309  -2.446   0.0163 *  
education    4.2921076  0.3360645  12.772  < 2e-16 ***
income       0.0012415  0.0002185   5.682 1.45e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.45 on 95 degrees of freedom
Multiple R-squared:  0.814, Adjusted R-squared:  0.8101 
F-statistic: 207.9 on 2 and 95 DF,  p-value: < 2.2e-16

mod2b <- lm(prestige ~ education + type, data = prestige_trim)

summary(mod2b)

Call:
lm(formula = prestige ~ education + type, data = prestige_trim)

Residuals:
    Min      1Q  Median      3Q     Max 
-19.410  -5.508   1.360   5.694  17.171 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -2.6982     5.7361  -0.470   0.6392    
education     4.5728     0.6716   6.809 9.16e-10 ***
typeprof      6.1424     4.2590   1.442   0.1526    
typewc       -5.4585     2.6907  -2.029   0.0453 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.814 on 94 degrees of freedom
Multiple R-squared:  0.7975,    Adjusted R-squared:  0.791 
F-statistic: 123.4 on 3 and 94 DF,  p-value: < 2.2e-16

The p-values for the type dummy variables are mixed: one is above 0.05 and one is below. To check whether type as a whole is significant, we can use the anova() function to compare the models.

anova(mod1a, mod2b)
Analysis of Variance Table

Model 1: prestige ~ education
Model 2: prestige ~ education + type
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     96 7064.4                                  
2     94 5740.0  2    1324.4 10.844 5.787e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This shows that type IS statistically significant, so it could be used. However, we noted that income was the stronger predictor, so we will proceed with the education + income model.

prestige_remain_resid <- prestige_trim %>%
  add_residuals(mod2a) %>%
  select(-c("prestige", "education", "income"))

prestige_remain_resid %>%
  ggpairs(aes(colour = type, alpha = 0.5))

There is still some pattern in the residuals across type, so we will use it as our third predictor.

mod3a <- lm(prestige ~ education + income + type, data = prestige_trim)

summary(mod3a)

Call:
lm(formula = prestige ~ education + income + type, data = prestige_trim)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.9529  -4.4486   0.1678   5.0566  18.6320 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.6229292  5.2275255  -0.119    0.905    
education    3.6731661  0.6405016   5.735 1.21e-07 ***
income       0.0010132  0.0002209   4.586 1.40e-05 ***
typeprof     6.0389707  3.8668551   1.562    0.122    
typewc      -2.7372307  2.5139324  -1.089    0.279    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.095 on 93 degrees of freedom
Multiple R-squared:  0.8349,    Adjusted R-squared:  0.8278 
F-statistic: 117.5 on 4 and 93 DF,  p-value: < 2.2e-16

While the two type dummy variables are not individually significant here, the reference level is hidden in the intercept, so it is important to run the anova anyway to test type as a whole.

anova(mod2a, mod3a)
Analysis of Variance Table

Model 1: prestige ~ education + income
Model 2: prestige ~ education + income + type
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     95 5272.4                                
2     93 4681.3  2    591.16 5.8721 0.003966 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

** BREAK **

Interactions

You should only add an interaction between variables that are already in the model as main effects.

Examining interactions

Education and income - both are continuous, so we use a coplot of the residuals against income, conditioned on education.

Adding some regression lines

This shows how the trend changes across the education panels: as education increases, the relationship between income and the residuals changes (the slope eventually turns downwards), which suggests a possible interaction.

Looking at education and type

prestige_resid %>%
  ggplot() +
  aes(x = education, y = resid, color = type) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Income and type

prestige_resid %>%
  ggplot() +
  aes(x = income, y = resid, color = type) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Task: Test all 3 interactions in your model separately, and choose the best.

summary(mod4c)

Call:
lm(formula = prestige ~ education + income + type + income:type, 
    data = prestige_trim)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.8720  -4.8321   0.8534   4.1425  19.6710 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -6.7272633  4.9515480  -1.359   0.1776    
education        3.0396961  0.6003699   5.063 2.14e-06 ***
income           0.0031344  0.0005215   6.010 3.79e-08 ***
typeprof        25.1723873  5.4669586   4.604 1.34e-05 ***
typewc           7.1375093  5.2898177   1.349   0.1806    
income:typeprof -0.0025102  0.0005530  -4.539 1.72e-05 ***
income:typewc   -0.0014856  0.0008720  -1.704   0.0919 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.455 on 91 degrees of freedom
Multiple R-squared:  0.8663,    Adjusted R-squared:  0.8574 
F-statistic: 98.23 on 6 and 91 DF,  p-value: < 2.2e-16

Imagine a manager asks: which variable has the most impact / where should we focus our efforts? Relative importance analysis is one way to determine this.

lmg - the Lindeman, Merenda and Gold method

calc.relimp(mod4c, type = "lmg", rela = T)
Response variable: prestige 
Total response variance: 292.2358 
Analysis based on 98 observations 

6 Regressors: 
Some regressors combined in groups: 
        Group  type : typeprof typewc 
        Group  income:type : income:typeprof income:typewc 

 Relative importance of 4 (groups of) regressors assessed: 
 type income:type education income 
 
Proportion of variance explained by model: 86.63%
Metrics are normalized to sum to 100% (rela=TRUE). 

Relative importance metrics: 

                   lmg
type        0.39609310
income:type 0.04222622
education   0.30489921
income      0.25678147

Average coefficients for different model sizes: 

                      1group      2groups      3groups      4groups
education        5.388407674  4.432450330  3.673166052  3.039696088
income           0.002843574  0.001321366  0.002518339  0.003134410
typeprof        32.321114370 15.598958947 25.528996354 25.172387319
typewc           6.716205534  0.854330301  8.121753937  7.137509272
income:typeprof          NaN          NaN -0.003178285 -0.002510174
income:typewc            NaN          NaN -0.002171217 -0.001485560

The variable with the highest metric is type (even though it was only the third variable we added) - it accounts for roughly 40% of the explained variance. This would be a good argument for a business to give real weight to this variable.

library(lm.beta)

mod4c_beta <- lm.beta(mod4c)
summary(mod4c_beta)

Call:
lm(formula = prestige ~ education + income + type + income:type, 
    data = prestige_trim)

Residuals:
     Min       1Q   Median       3Q      Max 
-12.8720  -4.8321   0.8534   4.1425  19.6710 

Coefficients:
                  Estimate Standardized Std. Error t value Pr(>|t|)    
(Intercept)     -6.7272633    0.0000000  4.9515480  -1.359   0.1776    
education        3.0396961    0.4887966  0.6003699   5.063 2.14e-06 ***
income           0.0031344    0.7752428  0.0005215   6.010 3.79e-08 ***
typeprof        25.1723873    0.6882988  5.4669586   4.604 1.34e-05 ***
typewc           7.1375093    0.1778589  5.2898177   1.349   0.1806    
income:typeprof -0.0025102   -0.8493433  0.0005530  -4.539 1.72e-05 ***
income:typewc   -0.0014856   -0.2036043  0.0008720  -1.704   0.0919 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.455 on 91 degrees of freedom
Multiple R-squared:  0.8663,    Adjusted R-squared:  0.8574 
F-statistic: 98.23 on 6 and 91 DF,  p-value: < 2.2e-16

** LUNCH BREAK **

Dimensionality Reduction

The curse of dimensionality

the more variables you add, the harder your model becomes to fit and interpret - the data become increasingly sparse in a high-dimensional space

Ways to tackle this problem

Variable reduction - we’ve already done this, by selecting the relevant / best variables

Dimensionality reduction - combining variables into components, where each component is a weighted mix of the original variables (e.g. 40% of one, 40% of another, and 20% of a third)

Variable reduction - there are 3 ways to do this:
- filter: assess each variable against some criterion, and filter it out if it doesn't meet it
- wrapper: fit models, check correlations and residuals, and then decide; forward selection (starting with nothing and adding variables in), as used in the first lesson, is a wrapper method
- embedded: selection happens as part of fitting the model itself (e.g. lasso regression shrinks some coefficients to zero)

Dimensionality Reduction

Principal Component Analysis

Because PCA works on a numeric matrix, you have to do feature engineering first (categorical variables can't be used directly; they must be dropped or converted to numeric dummy variables)

Principal Component Analysis Task

cleaning dataset

cars_data <- cars_data %>%
  select(-c("vs", "am"))
Error in select(., -c("vs", "am")) : unused argument (-c("vs", "am"))

creating the PCA

summary(cars_pca)
Importance of components:
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7    PC8     PC9
Standard deviation     2.3782 1.4429 0.71008 0.51481 0.42797 0.35184 0.32413 0.2419 0.14896
Proportion of Variance 0.6284 0.2313 0.05602 0.02945 0.02035 0.01375 0.01167 0.0065 0.00247
Cumulative Proportion  0.6284 0.8598 0.91581 0.94525 0.96560 0.97936 0.99103 0.9975 1.00000

fviz_pca_ind(cars_pca,
             repel = TRUE)
Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider increasing max.overlaps

Creating a plot which is a mix of the two previous plots

fviz_pca_biplot(cars_pca,
                repel = TRUE,
                col.var = "#00008b",
                col.ind = "#d3d3d3")
Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider increasing max.overlaps

summary(unscaled_pca)
Importance of components:
                           PC1      PC2     PC3     PC4     PC5     PC6    PC7    PC8    PC9
Standard deviation     136.532 38.14735 3.06642 1.27492 0.90474 0.64734 0.3054 0.2859 0.2159
Proportion of Variance   0.927  0.07237 0.00047 0.00008 0.00004 0.00002 0.0000 0.0000 0.0000
Cumulative Proportion    0.927  0.99938 0.99985 0.99993 0.99997 0.99999 1.0000 1.0000 1.0000
---
title: "R Notebook"
output: html_notebook
---

Week 10 - Day 3

## Manual Model Building

Explanatory model - this is what we explored yesterday: check each candidate predictor for statistical significance and add it to the model,
continuing until the model is built. You can explain the model and you know how it makes its predictions.

Predictive model - black-box models; strong predictive power, but little statistical explanation.

Model building follows a systematic approach, but it is not an exact science; it is equally an art, and domain knowledge is therefore important.


```{r}
library(tidyverse)
library(car)
library(modelr)
library(GGally)
```

```{r}
Prestige
```

Getting rid of the census variable

```{r}
prestige_trim <- Prestige %>%
  select(-census) %>%
  drop_na()
```

Checking for missing values (rows with NAs are dropped above with drop_na())
```{r}
summary(prestige_trim)
```

Choosing first predictor
```{r}
# prestige is the response; choosing the first predictor from the pairs plot

prestige_trim %>%
  ggpairs(aes(color = type, alpha = 0.5))

# from these plots, education, income, and type look like the strongest candidate predictors
```

Looking at education first
```{r}
mod1a <- lm(prestige ~ education, data = prestige_trim)

mod1a
```
Interpretation of this:

Prestige = -10.841 + 5.388 * education

Which means -10.841 is where the line intercepts the y-axis (the predicted prestige at zero years of education),
and prestige then increases by 5.388 points for each additional year of education.
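
A quick sanity check of this interpretation (a minimal sketch using base R's predict(); 12 years of education is just an illustrative input):

```{r}
# predicted prestige for an occupation requiring 12 years of education
predict(mod1a, newdata = data.frame(education = 12))
```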

```{r}
summary(mod1a)
```

Diagnostic plot
```{r}
par(mfrow = c(2, 2))

plot(mod1a)
```

Task
```{r}
mod1b <- lm(prestige ~ type, data = prestige_trim)

mod1b

summary(mod1b)

par(mfrow = c(2, 2))

plot(mod1b)
# the plots look odd because the variable is categorical - this would change if there was a continuous variable in

```

Choosing a second predictor by plotting the residuals of the first model against the remaining variables
```{r}
prestige_remain_resid <- prestige_trim %>%
  add_residuals(mod1a) %>%
  select(-c("prestige", "education"))

prestige_remain_resid %>%
  ggpairs(aes(color = type, alpha = 0.5))
```

Second model development: based on the residual plots, adding income as the second predictor (and also trying type for comparison)

```{r}
mod2a <- lm(prestige ~ education + income, data = prestige_trim)

summary(mod2a)
```

```{r}
mod2b <- lm(prestige ~ education + type, data = prestige_trim)

summary(mod2b)
```
The p-values for the type dummy variables are mixed: one is above 0.05 and one is below.
To check whether type as a whole is significant, we can use the anova() function to compare the models.


```{r}
anova(mod1a, mod2b)
```
This shows that type IS statistically significant, so it could be used.
However, we noted that income was the stronger predictor, so we will proceed with the education + income model.


```{r}
prestige_remain_resid <- prestige_trim %>%
  add_residuals(mod2a) %>%
  select(-c("prestige", "education", "income"))

prestige_remain_resid %>%
  ggpairs(aes(colour = type, alpha = 0.5))
```

There is still some pattern in the residuals across type, so we will use it as our third predictor.

```{r}
mod3a <- lm(prestige ~ education + income + type, data = prestige_trim)

summary(mod3a)
```
While the two type dummy variables are not individually significant here, the reference level is hidden in the
intercept, so it is important to run the anova anyway to test type as a whole.

```{r}
anova(mod2a, mod3a)
```

** BREAK **

Interactions

You should only add an interaction between variables that are already in the model as main effects.

```{r}
prestige_resid <- prestige_trim %>%
  add_residuals(mod3a) %>%
  select(-prestige)
```

Examining interactions

Education and income - both are continuous, so we use a coplot of the residuals against income, conditioned on education.

```{r}
coplot(resid ~ income | education, data = prestige_resid, columns = 6)
```

Adding some regression lines

```{r}
coplot(resid ~ income | education,
       panel = function(x,y, ...){
         points(x,y)
         abline(lm(y ~ x), col = "blue")
       },
       data = prestige_resid,
       columns = 6)
```
This shows how the trend changes across the education panels: as education increases, the relationship between income and the residuals changes (the slope eventually turns downwards), which suggests a possible interaction.


Looking at education and type

```{r}
prestige_resid %>%
  ggplot() +
  aes(x = education, y = resid, color = type) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

Income and type
```{r}
prestige_resid %>%
  ggplot() +
  aes(x = income, y = resid, color = type) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

Task
Test all 3 interactions in your model separately, and choose the best.

```{r}
# education and income

mod4a <- lm(prestige ~ education + income + type + education:income, data = prestige_trim)

summary(mod4a)

# not terrible, but not great

par(mfrow = c(2, 2))

plot(mod4a)
```

```{r}
# education and type

mod4b <- lm(prestige ~ education + income + type + education:type, data = prestige_trim)

summary(mod4b)

# roughly the same as the other one

par(mfrow = c(2, 2))

plot(mod4b)
```

```{r}
# income and type

mod4c <- lm(prestige ~ education + income + type + income:type, data = prestige_trim)

summary(mod4c)

# higher R-squared than the other two - one of the interaction terms is statistically significant too

par(mfrow = c(2, 2))

plot(mod4c)

anova(mod3a, mod4c)
```

Imagine a manager asks: which variable has the most impact / where should
we focus our efforts? Relative importance analysis is one way to determine this.

lmg - the Lindeman, Merenda and Gold method

```{r}
library(relaimpo)

calc.relimp(mod4c, type = "lmg", rela = T)
```
The variable with the highest metric is type (even though it was only the third variable we added) - it accounts for roughly 40% of the explained variance.
This would be a good argument for a business to give real weight to this variable.

```{r}
# another way to answer the same question, using the standardised beta coefficients
library(lm.beta)

mod4c_beta <- lm.beta(mod4c)
summary(mod4c_beta)
```

** LUNCH BREAK **

## Dimensionality Reduction

The curse of dimensionality

the more variables you add, the harder your model becomes to fit and interpret - the data become increasingly sparse in a high-dimensional space

Ways to tackle this problem

Variable reduction
- we've already done this, by selecting the relevant / best variables

Dimensionality reduction
- combining variables into components, where each component is a weighted mix of the original variables (e.g. 40% of one, 40% of another, and 20% of a third)

Variable reduction - there are 3 ways to do this:
- filter: assess each variable against some criterion, and filter it out if it doesn't meet it
- wrapper: fit models, check correlations and residuals, and then decide; forward selection (starting with nothing and adding variables in), as used in the first lesson, is a wrapper method - see the sketch after this list
- embedded: selection happens as part of fitting the model itself (e.g. lasso regression shrinks some coefficients to zero)
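
A minimal sketch of automated forward selection using base R's step(), assuming prestige_trim from earlier is still loaded (the intercept-only starting model and AIC-based stepping are illustrative choices, not part of the lesson):

```{r}
# start from the intercept-only model and add one predictor at a time by AIC
null_model <- lm(prestige ~ 1, data = prestige_trim)
full_model <- lm(prestige ~ ., data = prestige_trim)
step(null_model, scope = formula(full_model), direction = "forward")
```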

Dimensionality Reduction

Principal Component Analysis

Because PCA works on a numeric matrix, you have to do feature engineering first (categorical variables can't be used directly; they must be dropped or converted to numeric dummy variables - see the sketch below)
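
A minimal sketch (my addition, not part of the lesson) of one way to expand a factor into numeric dummy columns before PCA, using base R's model.matrix() on the prestige_trim data from earlier:

```{r}
# model.matrix() expands the factor type into typeprof/typewc dummy columns;
# [, -1] drops the intercept column
prestige_numeric <- model.matrix(~ ., data = prestige_trim)[, -1]
head(prestige_numeric)
```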

Principal Component Analysis Task
```{r}
cars_data <- mtcars
```

cleaning dataset
```{r}
# plain select() errors here because it is masked by another loaded package,
# so the dplyr:: prefix is used explicitly
cars_data <- cars_data %>%
  dplyr::select(-c("vs", "am"))
```

creating the PCA
```{r}
cars_pca <- prcomp(cars_data, center = TRUE, scale. = TRUE)

summary(cars_pca)
```
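
To make "each component is a weighted mix of the original variables" concrete, we can inspect the loadings (a quick look I've added; rotation is the standard prcomp output slot):

```{r}
# weights (loadings) of the original variables on the first two components
cars_pca$rotation[, 1:2]
```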

```{r}
library(factoextra)

fviz_eig(cars_pca)

```


```{r}
fviz_pca_ind(cars_pca,
             repel = TRUE)
```


```{r}
fviz_pca_var(cars_pca,
             col.var = "contrib",
             # Color by contributions to the PC
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE     # Avoid text overlapping
             )
```

Creating a plot which is a mix of the two previous plots
```{r}
fviz_pca_biplot(cars_pca,
                repel = TRUE,
                col.var = "#00008b",
                col.ind = "#d3d3d3")
```


```{r}
unscaled_pca <- prcomp(cars_data, center = TRUE, scale. = FALSE)
summary(unscaled_pca)


# looking at the 1st PC, it appears that ~93% of the variance is explained by a single component.
# this doesn't seem right; looking at the original dataset, it is clear that
# disp and hp are on a much larger scale than drat, wt, etc. - without scaling,
# those columns dominate the variance and throw the PCA off.
```
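
A quick check (my own addition) that makes the scale problem concrete - the raw column variances of disp and hp dwarf everything else:

```{r}
# raw column variances; unscaled PCA loads almost entirely on the largest-variance columns
sapply(cars_data, var)
```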

```{r}
mtcars
```





